A fuzzy monster in a beret and scarf, critiquing their own column graph on a canvas in front of them while other assistant monsters (also in berets) carry over boxes full of elements that can be used to customize a graph (like themes and geometric shapes). In the background is a wall with framed data visualizations. Stylized text reads “ggplot2: build a data masterpiece.

Figure from Allison Horst

Introduction

The very popular R package ggplot2 is based on a system called the Grammar of Graphics by Leland Wilkinson which aims to create a grammatical rules for the development of graphics. It is part of a larger group of packages called “the tidyverse.”

What is the tidyverse?

The package ggplot2 is a part of a larger collection of packages called “the tidyverse” that are designed for data science. You can certainly use R without using the tidyverse, but it has many packages that I think will make your life a lot easier.

We can install just ggplot2 or install all of the packages in the core tidyverse (which is what I’d recommend since we will use the others too), which include:

  • dplyr: for data manipulation
  • ggplot2: a “grammar of graphics” for creating beautiful plots
  • readr: for reading in rectangular data (i.e., Excel-style formatting)
  • tibble: using tibbles as modern/better dataframes
  • stringr: handling strings (i.e., text or stuff in quotes)
  • forcats: for handling categorical variables (i.e., factors) (meow!)
  • tidyr: to make “tidy data”
  • purrr: for enhancing functional programming (also meow!)

We will be using many of these other packages in this course, but will talk about them as we go. There are more tidyverse packages outside of these core eight, and we will talk about some of them another time.

tl;dr Tidyverse has a lot of packages that make data analysis easier. None of them are required, but I think you’ll find many tidyverse approaches easier and more intuitive than using base R.

You can find here some examples of comparing tidyverse and base R syntax.

Installing ggplot & tidyverse

To install packages in R that are on the Comprehensive R Archive Network (CRAN), you can use the function install.packages().

install.packages("tidyverse")
install.packages("ggplot2")

We only need to install packages once. But, every time we want to use them, we need to “load” them, and can do this using the function library().

tl:dr install.packages() once, library() every time.

ggplot

The “gg” in ggplot stands for “grammar of graphics” and all plots share a common template. This is fundamentally different than plotting using a program like Excel, where you first pick your plot type, and then you add your data. With ggplot, you start with data, add a coordinate system, and then add “geoms,” which indicate what type of plot you want. A cool thing about ggplot is that you can add and layer different geoms together, to create a fully customized plot that is exactly what you want. If this sounds nebulous right now, that’s okay, we are going to talk more about this.

A group of fuzzy round monsters with binoculars, backpacks and guide books looking up a graphs flying around with wings (like birders, but with exploratory data visualizations). Stylized text reads “ggplot2: visual data exploration.”

Figure from Allison Horst

In class

In class, we will practice using ggplot